

ISSN 2348 - 8034 Impact Factor- 4.022

# GLOBAL JOURNAL OF ENGINEERING SCIENCE AND RESEARCHES BEC BASED FFT ARCHITECTURE USING CORDIC ALGORITHM

**R.** Merlin Princy<sup>\*1</sup> & S. Arun Kumar<sup>\*2</sup>

<sup>\*1</sup>PG Scholar, Department of ECE, PSNACET, Dindigul, India <sup>\*2</sup>Assistant Professor, Department of ECE, PSNACET, Dindigul, India.

#### ABSTRACT

Digital signal processing method uses non-linear functions such as Discrete cosine transform and Discrete wavelet transfom.Because it is accomplished by repetative application of addition and multiplication,the speed of multiplication and addition arithmetic determines the execution speed and performance of the entire speed. Because the multiplier requirelongest delay in operational blocks. This paper presents multiplier and accumulator (MAC) architecture is used for high speed arithmetic computation in Binary FFT using selective rotation.By combining multiplication with accumulation and devising a type of carry save adder (CSA).The accumulator has largest delay in MAC was merged into CSA ,and the performance was evaluated.The proposed CSA uses Booth algorithm and it has modified for sign extension in order to increase the bit density of the operand.The CSA propagates least significant bits (LSB) to reduce the number of input bits in adder.MAC accumulates the intermediate results instead of output of the final adder. The proposed architecture is based on the radix-2 FFT algorithm only half of the samples at each must be rotated. This method is habituated to reduce a computation cost,area and computational complexity.Develop the binary version of FFT, in order to reduce processing overhead.This architecture has been coded in Verilog and simulated by using Xilinx 12.1.

## I. INTRODUCTION

The Fast Fourier Transform (FFT) is a way to calculate the Discrete Fourier Transform (DFT) as quickly. It reduces the amount of computation time, Area and increase the speed. It decomposes the set of data to be transformed into a series of smaller data sets to be transformed. It is widely used for many applications such as Medical electronics, Signal Processing, Image Processing, Animation. The Fast Fourier transform algorithm computes the Discrete Fourier Transform of a sequence or its inverse. Fourier analysis converts a signal from its original domain to a representation in the frequency domain and vice versa. It is simply a fast way to calculate the Discrete Fourier Transform (DFT). Greatly it reduces the amount of calculation required. It decomposes the set of data to be transformed into a series of smaller data sets to be transformed. The "radix" is the size of an FFT decomposition. "Twiddle factors" are the coefficients used to combine results from a previous stage to form inputs to the next stage. one of the key components in arithmetic circuits, adders have been used in the digital electronics. It has been relatively less effort in the design of approximate multipliers. A multiplier usually consists of three stages: partial product generation, partial product accumulation and a carry propagation adder (CPA) at the final stage.





ISSN 2348 - 8034 Impact Factor- 4.022



FFT is a way to evaluate Discrete Fourier transform result as more quickly. Fourier analysis is conversion of signal from time domain to frequency domain. It reduce the complexity of evaluating the DFT from  $O(n^2)$  to  $O(n \log n)$ . The Cooley-tukey is a conquer algorithm that recursively breakdown a DFT. The N-point DFT of an input sequence x[n] is defined as,

$$\begin{aligned} \mathbf{X}(\mathbf{k}) &= \sum_{n=0}^{N-1} \mathbf{x}(n) \mathbf{W}_{\mathrm{N}}^{\mathrm{kn}} & (1) \\ \mathbf{W}_{\mathrm{N}}^{\mathrm{kn}} &= \mathrm{e}^{\mathrm{j} 2 \pi \mathrm{kn} / \mathrm{N}} & (2) \\ &n &= 0, 1, 2, 3, \dots, \mathrm{N-1} \end{aligned}$$

The serial commutator (SC) FFT is based on the circuits for bit-dimension permutation of serial data. It habituates a novel data management. This Process is based on radix-2 FFT algorithm and only half of the samples at each stage must be rotated.

[12] In DSP system, rotation of vectors through fixed and known angles has wide applications in robotics, digital signal processing, graphics, games, and animation. But, CORDIC value design for vector-rotation through specific angle is not defined. It defines the optimization method and CORDIC circuits for fixed and known rotations with different levels of accuracy. To reduce the area and time complexities, a hard wired pre-shifting method in barrel-shifters. Two devoted CORDIC cells are habituated for the fixed-angle rotations. Micro-rotations and scaling are interleaved and it is implemented in two various stages. Pipelined methods are suggested to cascading devoted single-rotation units and bi-rotation .CORDIC units are used for high throughput and reduced latency.

[8] In signal processing, there are two algorithm used to achieve less area and high throughput.First algorithm eliminates ROM and it require only low width barrel shifter.Second algorithm eliminates barrel shifter completely.It doesn't habituate normal angle.A small set of angle is used to realize the rotation by an angle in order to reduce size of barrel shifter.14 RAM blocks is accoustamated to storing.The efficacious CORDIC is applied for entire range of angles.It require 3 adder/subtractor 1046 flip-flop/latch.

12





## ISSN 2348 - 8034 Impact Factor- 4.022

[13] In FFT algorithm is implemented in many Digital Signal Processing (DSP) applications and hardware platforms for real-time application such as spectrum analysis, speech processing and filter. Filter coefficients are determined corresponding to the frequency of the filter. 128-point FFT is designed by applying the mixed-radix number representation to effectively reduce the number of additions and multiplications. The computational complexity of twiddle factor operations of FFT is deduced by habituating CORDIC module, to confine the multiplication operations to simple addition and shift operations.the conventional multiplier required in the realization of twiddle factor multiplications in existing architecture is replaced by CORDIC to achieve further reduction in hardware. data paths are redirected in order to reuse the major arithmetic blocks, thus significantly reducing the hardware cost. [3] Anumol B. Chennattucherry, Diego James (2013) have proposed the Fast Fourier Transform (FFT) is widely used in digital signal processing algorithm. The advent of semiconductor processing technology in VLSI system, different approaches had been tried in order to optimize the algorithm for a variety of parameters such as area, power and speed. FFT block which is capable of computing N point FFT based on Radix -2 Decimation-In-Time (DIT) architecture with carry select adder (CSLA) and a gate level modification to CSLA. The adder circuit is used in the butterfly structure is the carry select adder which consists of two Ripple Carry Adders (RCA) with C<sub>in</sub>=0 and  $C_{in}=1$  and a Multiplexer. The problem in CSLA design is not area efficient because it uses multiple pairs of RCA to generate partial sum and carry by considering carry input and the final sum and carry are selected by the mux. In order to use of Binary to Excess-1 Converter (BEC) instead of RCA with C<sub>in</sub> =1 in the regular CSLA. Implementation Radix-2 N point FFT in hardware using hardware language (VHDL). The overall area and power are reduced. The advantage of BEC logic comes from the lesser number of logic gates than the n-bit Full Adder (FA) structure. CSLA architecture have low area, low power, so it is simple and efficient for VLSI hardware implementation.

[4] Cong Liu, Jie Han (2015) have proposed the approximate circuits for error-tolerant applications, that can tolerate some loss of accuracy with improved performance and energy efficiency. Multipliers are key arithmetic circuits in many such applications such as digital signal processing (DSP). A novel approximate multiplier design is used to achieve a simple and fast approximate adder. It has a critical path delay that is even shorter than a conventional onebit full adder. The adder simultaneously computes the sum and generates an error signal this feature is employed to reduce the error in the multiplier. This multiplier leverages is designed approximate adder that limits its carry propagation to the nearest neighbors for fast partial product accumulation. Different levels of accuracy can be achieved through a configurable error recovery by using different numbers of most significant bits (MSB) for error reduction. The approximate adders is used for partial product accumulation and the error signals are used to compensate the error for obtaining a better accuracy.

## II. BEC BASED BINARY FFT

The binary FFT habituates a novel data management. It is a new type of hardware architecture is known as Binary FFT. It it based on binary bit values for each iteration. The process is based on radix-2 FFT algorithm. Here, only half of the samples at each stage must be rotated. The binary version of FFT is presented for fast and efficient computation of signal data. In FFT computation ,Twiddle fcator play an impartant role. The twiddle factor angle value is computed by using CORDIC algorithm. So the angle value function is inserted into the FFT verilog program which is named as Binary FFT. Then, rest of the system uses some algorithm and computing methods for other basic arithmetic operation in FFT structure. So ,the basic arithmetic operations are computed technically in the system which is combined with Binary FFT.

#### **Binary FFT**

The cascaded pipelined circuit for this class of problem which is faster and involves less area delay complexity. The order of arrival is from top to bottom. All the stages, consecutive samples are operated together in a butterfly. The angle values are computed using the CORDIC formulae. That values are converted into binary form and it is fed to the serial commutator which execute Real and Imaginary value for twiddle factor.

13





#### Booth multiplication

Product and summation are the basic and important process in FFT system. During the computation it require more time, require more space storage, and reduce the speed. In modifiedMultipliers are more complex than adders and subtractors. The speed of a multiplier usually determines the operating speed of a DSP system. Booth's algorithm examines adjacent pairs of bits of the N bit multiplier Y signed two's complement representation, including an implicit bit below the Least Significant Bit (LSB), y-1 = 0. The inadequate technique is used when the multiplicand is the most negative number. Booth multiplication reduces the number of additions for intermediate results. Positive and negative numbers treated alike., the generation of recoded multiplicands.

#### **Booth Encoding**

A positive multiplier consist of a block of 1s surrounded by 0s. M is the multiplicand. The number of oerations can be reduced to two. Any sequence of 1s in a binary number can be broken into the difference of two binary numbers.

| C <sub>K</sub> | S <sub>K</sub> |  |
|----------------|----------------|--|
| 000            | 0              |  |
| 001            | +1             |  |
| 010            | +1             |  |
| 011            | +2             |  |
| 100            | -2             |  |
| 101            | -1             |  |
| 110            | -1             |  |
| 111            | 0              |  |

The multiplication can be replaced by the string of 1s in the original number by simple operation, Adding the multiplier, Shifting the partial product thus formed by appropriate places and finally subtract the multiplier. It can be extended to any number of 1s in a multiplier. The encoding values are based on 2 bit, 3 bit and so on. If the multiplier value bit is high, it will be encoded using the binary encoding table.



14

(C)Global Journal Of Engineering Science And Researches



#### [*Princy*, 4(6): June 2017] DOI- 10.5281/zenodo.802175 Carry Select Adder (CSA) with BEC

## ISSN 2348 - 8034 Impact Factor- 4.022

The problem of Ripple Carry Adder (RCA) is each addwr has to wait for the arrival of its carry input signal before the actual addition can start. The carry Select Adder (CSA) is to use blocks of two riple carry adder, one of which is fed with constant 0 carry in while the other is fed with a constant 1 carry in. Both the block can be calculate in parallel. When the actual carry-in signal for the block arrives, multiplexers are used to select the correct one of both precalculated partial sums. The resulting carry out is selected and propagated to the next carry select block.

The carry propagation time through an n-bit adder block is reduced from O(n) to the number of stages times the delay of the multiplexer. Using n blocks of 1bit Carry select adders would incur a complexity of n Multiplexers. The advantage is only realized blocking the computation, and calculating parallel paths in each block the carry has the particular value. Each block can do its sum and have the stable result ready to wait for carry input from the preceeding blocks.



 $X0 = -B0 \tag{3}$ 

 $X1 = B0 \wedge B1 \tag{4}$ 

 $X2 = B2 \wedge (B0 \& B1)$  (5)





Excess 1 Logic **Binary Logic** X0, X1, X2 B0, B1, B2 000 001 001 010 010 011 011 100 100 101 101 110 110 111 000 111

Table 2.2 Truth Table for Binary to Excess 1 Code

**ISSN 2348 - 8034** 

Impact Factor- 4.022

Carry Select Adder uses single Ripple Carry Adder (RCA) for Cin = 0 Binary Excess Code (BEC) for Cin = 1. Using BEC with CSA is to obtain a reduced area and power consumption. Binary to Excess-1 converter is used to add 1 to the input numbers. One of the RCA is repliced with BEC because it requires less number of logic gates for its implementation so the area of the circuit is less.



Figure 2.3 Carry Select Adder with BEC

The FFT is processed in two different function. The angle computation part is stable for both the systems.

- 1) FFT using Booth multiplication
- 2) FFT using Carry Select adder with BEC



(C)Global Journal Of Engineering Science And Researches



## [Princy, 4(6): June 2017] DOI- 10.5281/zenodo.802175 III. EXPERIMENTAL RESULT

## ISSN 2348 - 8034 Impact Factor- 4.022

The suggested fault compensation circuit with fixed width multiplier are simulated by using Xilinx ISE 12.1 and enforced in spartan-6 FPGA processor. During the execution, the parameters are taken from the synthesis report. The input is purely based on 0's and 1's. The experimental results are shown in table.

| S.No. | Parameters   | Binary FFT | Binary FFT with<br>Booth Multiplier | Binary FFT<br>with Carry<br>Select Adder |  |
|-------|--------------|------------|-------------------------------------|------------------------------------------|--|
| 1.    | No.of.Slices | 75         | 70                                  | 59                                       |  |
| 2.    | No.of.LUTs   | 145        | 119                                 | 9                                        |  |
| 3.    | No.of.IOBs   | 256        | 19                                  | 19                                       |  |

Table 3.1 Parameter Analysis

#### **Performance Analysis**

The performance analysis is represented in Column chart method. There is a considerable reduction in time and area based on the implementation results which have been done by using Spartan-3 processor. The proposed algorithm is significantly reduces area consumption when compared to the presented system.



Figure 3.1 Performance Analysis

#### **Simulation Results**

In the below diagram shows that the output of the encoded multiplier and multiplicand. A,b is the input and z is a output. Here, manual computation is avoided.





ISSN 2348 - 8034 Impact Factor- 4.022

| <b>10.5261/201000.002175</b> |           |   | <br>Inipatt ratio1-4.02 |  |           |  |
|------------------------------|-----------|---|-------------------------|--|-----------|--|
| 🕨 📑 a[4:0]                   | 00111     |   |                         |  | 00111     |  |
| 🕨 📑 b[4:0]                   | 00101     |   |                         |  | 00101     |  |
| ▶ 📷 z[8:0]                   | 000100011 |   |                         |  | 000100011 |  |
| ▶ 🍯 tem1[3:0]                | 0011      |   |                         |  | 0011      |  |
| ▶ 🍯 tem2[3:0]                | 0000      | , |                         |  | 0000      |  |
| ▶ 🍯 x[4:-1]                  | 001110    |   |                         |  | 001110    |  |
| ▶ 🍯 y[4:-1]                  | 001010    |   |                         |  | 001010    |  |
| 🕨 🍯 u[4:0]                   | 00111     |   |                         |  | 00111     |  |

Figure 3.2 Booth multiplication output

In the below diagram, shows that the summation value which is used in Carry Select Adder to obtain complexity reduction and area is reduced. This summation value plays an important role in Carry Select Adder. Each terms represent the real and imaginary value of the addition values in FFT.

| l <b>+</b>     | 28  | 28  |  |  |
|----------------|-----|-----|--|--|
| 💶 🤣 /fft/y 10  | -4  | -4  |  |  |
| . <b>.</b>     | 12  | 12  |  |  |
| 😐 - 🥎 /fft/y20 | -4  | -4  |  |  |
| 💶 🤣 /fft/y21   | 4   | 4   |  |  |
| . <b></b>      | -4  | -4  |  |  |
| 😐 - 🍫 /fft/y31 | 2   | 2   |  |  |
| 😐 - 🥎 /fft/y4  | -4  | -4  |  |  |
| 😐 - 🧇 /fft/y50 | -4  | -4  |  |  |
| 😐 - 🤣 /fft/y51 | -2  | -2  |  |  |
| . <b>.</b>     | -4  | -4  |  |  |
| . <b>.</b>     | 4   | 4   |  |  |
|                | -4  | -4  |  |  |
| <b>⊥</b>       | -10 | -10 |  |  |
| - A            |     |     |  |  |

Figure34.3 Carry Select Adder Output

From the Figure shows that adding values which is applied to compute the final addition value of the Carry Select Adder. It is obtained by using Carry select Adder.





## **IV.** CONCLUSION

The presentation is based on Binary FFT architecture with Arithmetic function computing unit. It is the first FFT architecture used to compute the value of output sequencs using CORDIC algorithm a which is combined with arithmetic computing unit. It creates a data management that allows for using the theoretical minimum amount of hardware resources for a Binary FFT. Each and every moment of the FFT is the combination of the Basic arithmetic operations including twiddle factor. That basic operations are computed using Booth multiplication and Carry Select Adder with BEC in FFT. By using the CORDIC algorithm, the inbetween Multiplication and addition process is computed. It must be easy, achieve less number of iteration. It achieves less computation time, it occupies less area, computation speed is high. Hence, it is a efficacious method when compared to previous methods. A solution for natural I/O has also been presented, which offers comparable results to previous natural I/O FFT. The experimental results are obtained to verify the architecture, leading to small area and low power consumption.

#### REFERENCES

- 1. Akshata U., Hameem Shanavas I., Gopika D. (2013), 'Hardware Implementation of DIT FFT', MIT International Journal of Electronics and Communication Engineering, Vol. 3, No. 1.
- Amjadha A., Konguvel E., Raja J. (2014), 'Design of Multipath Delay Commutator Architecture Based FFT Processor for 4<sup>th</sup> Generation System', International Journal of Computer Applications, Volume 89 -No 12.
- 3. Anumol B. Chennattucherry, Diego James (2013), 'Optimized Memristor-Based Multipliers', IEEE Transactions on Circuits and Systems–I: regular papers, VOL. 64, NO. 2.
- 4. Cong Liu, Jie Han, Fabrizio Lombardi (2013), 'A Low-Power, High-Performance Approximate Multiplier with Configurable Partial Error Recovery', MIT International Journal of Electronics and Communication Engineering, Vol. 3, No. 1.
- 5. Honglan Jiang, Jie Han, Fei Qiao (2016), 'Approximate Radix-8 Booth Multipliers for Low-Power and High-Performance Operation', IEEE Transactions on Computers, VOL. 65, NO. 8.
- 6. Jampa Shravani, Siva Nagi Reddy K. (2014) 'Area Efficient Pipelined Radix-2<sup>k</sup> Feedforward FFT Using Booth Multiplier', International Journal of VLSI System ISSN 2322-0929, Vol.02.
- 7. Jesus Garcia, Juan Michell A., Angel Buron M. (1999), 'VLSI Configurable Delay Commutator for Pipeline Split Radix FFT Architecture', IEEE Transactions on Signal Processing, VOL. 47, NO.11.
- 8. Joseph E., Rajagopal A., Karibasappa K. (2012), 'FPGA implementation of Radix-2 FFT processor based on Radix-4 CORDIC', International Conference on signal processing.





[Princy, 4(6): June 2017]

## DOI- 10.5281/zenodo.802175

- 9. Jun Ma, Keshab Parhi K. (2004), 'Pipelined CORDIC-Based State Space Orthogonal Recursive Digital Filters Using Matrix Look-Ahead', IEEE Transactions on signal processing.
- 10. Leena Vachhani, Sridharan K., Pramod Meher K. (2008), 'Efficient CORDIC Algorithms and Architectures for Low Area and High Throughput Implementation', IEEE Transactions on Computers, 57(8):1148–1152.
- 11. Mario Garrido, Jesus Grajal, Oscar Gustafsson (2011), 'Optimum Circuits for Bit Reversal', IEEE Transaction on circuits and system II. Vol. 58, NO.10.
- 12. Paramasivam C., Jeyanthi K.B. (2013), 'Modified Scaling Free CORDIC based Pipeline MDC FFT&IFFT for Radix 2<sup>2</sup> Algorithm', IEEE Transaction on VLSI.
- 13. Pramod Kumar Meher, Sang Yoon Park (2013) 'CORDIC Design for Fixed Angle Rotation', IEEE Transactions on Very Large Scale Integration Systems, VOL. 21, NO. 2.
- 14. Sandhya G., Syed Inthiyaz, Fazal Noor Basha (2012), 'Power and Optimization for Pipelined CORDIC Architecture in VLSI', International Journal of Engineering Research and Applications Vol. 2, page.1588-1595.
- 15. Yasodhai A., Ramprasad A.V. (2014) 'An Efficient implementation of
- 16. rotational radix-4 CORDIC based FFT processor', Intelligent Computational Systems (RAICS), IEEE Recent Advances.



## ISSN 2348 – 8034 Impact Factor- 4.022